Remove len() calls from compress paths #520

mykaul · 2025-08-11T15:59:27Z

Pre-review checklist

I have split my patch into logically separate commits.
All commit messages clearly explain what they change and why.
I added relevant tests for new features and bug fixes.
All commits compile, pass static checks and pass test.
PR description sums up the changes and reasons why they should be introduced.
I have provided docstrings for the public items that I want to introduce.
I have adjusted the documentation in ./docs/source/.
I added appropriate Fixes: annotations to PR description.

…code paths. Minor perf. improvement (very) if compression is used - or not: do not call len() for the buffer redundantly. Signed-off-by: Yaniv Kaul <[email protected]>

Copilot

Pull Request Overview

This PR optimizes compression paths by eliminating redundant len() calls on data that will be compressed. The changes store the length of data in variables before compression and pass it as a parameter to compression functions, avoiding multiple length calculations on the same data.

Modified compression function signatures to accept a length parameter
Updated compression logic to pass pre-calculated lengths instead of recalculating after compression
Fixed a typo in a docstring ("one of more" → "one or more")

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
cassandra/connection.py	Updated LZ4 and Snappy compression functions to accept length parameter
cassandra/protocol.py	Modified message encoding to pre-calculate body length and pass to compressor
cassandra/segment.py	Updated segment compression to use pre-calculated lengths and fixed compression logic
tests/unit/test_segment.py	Updated test cases to use new compression function signature

cassandra/connection.py

cassandra/segment.py

In essence, we know the length of the uncompressed payload, so pass it along. To do so, changed the signature of the function (and did the same as dummy for snappy). Signed-off-by: Yaniv Kaul <[email protected]>

mykaul · 2025-08-11T16:59:08Z

To my surprise, very similar code exists in GoCQL....

mykaul · 2025-08-11T17:12:22Z

CI failure looks unrelated (especially as no compression was involved, but who knows):

=================================== FAILURES ===================================
______ LightweightTransactionTests.test_no_connection_refused_on_timeout _______

self = <tests.integration.standard.test_query.LightweightTransactionTests testMethod=test_no_connection_refused_on_timeout>

    def test_no_connection_refused_on_timeout(self):
        """
        Test for PYTHON-91 "Connection closed after LWT timeout"
        Verifies that connection to the cluster is not shut down when timeout occurs.
        Number of iterations can be specified with LWT_ITERATIONS environment variable.
        Default value is 1000
        """
        insert_statement = self.session.prepare("INSERT INTO test3rf.lwt (k, v) VALUES (0, 0) IF NOT EXISTS")
        delete_statement = self.session.prepare("DELETE FROM test3rf.lwt WHERE k = 0 IF EXISTS")
    
        iterations = int(os.getenv("LWT_ITERATIONS", 1000))
    
        # Prepare series of parallel statements
        statements_and_params = []
        for i in range(iterations):
            statements_and_params.append((insert_statement, ()))
            statements_and_params.append((delete_statement, ()))
    
        received_timeout = False
        results = execute_concurrent(self.session, statements_and_params, raise_on_first_error=False)
        for (success, result) in results:
            if success:
                continue
            else:
                # In this case result is an exception
                exception_type = type(result).__name__
                if exception_type == "NoHostAvailable":
                    pytest.fail("PYTHON-91: Disconnected from Cassandra: %s" % result.message)
                if exception_type in ["WriteTimeout", "WriteFailure", "ReadTimeout", "ReadFailure", "ErrorMessageSub"]:
                    if type(result).__name__ in ["WriteTimeout", "WriteFailure"]:
                        received_timeout = True
                    continue
    
                pytest.fail("Unexpected exception %s: %s" % (exception_type, result.message))
    
        # Make sure test passed
>       assert received_timeout
E       assert False

tests/integration/standard/test_query.py:957: AssertionError

Very similar changes to the Python driver (scylladb/python-driver#520 ), there are redunandant calls to len(). Removed them. Follow-up patch will hopefully change the Encode() functions to accept the length of the buffer, to remove another len() call (just as done in the Python change mentioned above). Signed-off-by: Yaniv Kaul <[email protected]>

Very similar changes to the Python driver (scylladb/python-driver#520 ), there are redundant calls to len(). Removed them. Follow-up patch will hopefully change the Encode() functions to accept the length of the buffer, to remove another len() call (just as done in the Python change mentioned above). Signed-off-by: Yaniv Kaul <[email protected]>

dkropachev · 2025-08-11T19:11:25Z

I don't think it makes sense to complicate API by saving couple of CPU ticks, if anything this optimization should be done by python or cython.

mykaul · 2025-08-12T06:05:05Z

I don't think it makes sense to complicate API by saving couple of CPU ticks, if anything this optimization should be done by python or cython.

I'm not sure what API I'm changing here (it's never clear to me what is considered an API in our Python driver - documentation is poor). I think a couple of CPU ticks + improved readability (IMHO) is a good direction.
The compiler is supposed to optimize good code, not mediocre code.

Nevertheless, if you don't agree with the approach, OK. Closing.

roydahan · 2025-08-12T14:14:25Z

We could bypass the "API" breakage, by setting a default and/or handling for missing values for each new parameter that was introduced to the these 2 functions.

dkropachev · 2025-08-12T14:47:56Z

We could bypass the "API" breakage, by setting a default and/or handling for missing values for each new parameter that was introduced to the these 2 functions.

The problem is not in the breaking API, but rather in complicating it without good reason.

Also this API ought to be changed to be streaming.

mykaul · 2025-08-13T08:06:06Z

We could bypass the "API" breakage, by setting a default and/or handling for missing values for each new parameter that was introduced to the these 2 functions.

The problem is not in the breaking API, but rather in complicating it without good reason.

Also this API ought to be changed to be streaming.

I don't fully understand what API you refer to. I don't think I've touched any public API.

(improvement)SegmentCodec - remove redundant len() calls in compress …

c89f2a5

…code paths. Minor perf. improvement (very) if compression is used - or not: do not call len() for the buffer redundantly. Signed-off-by: Yaniv Kaul <[email protected]>

mykaul requested a review from Copilot August 11, 2025 15:59

Copilot AI reviewed Aug 11, 2025

View reviewed changes

cassandra/connection.py Outdated Show resolved Hide resolved

cassandra/segment.py Show resolved Hide resolved

(improvement)remove extra len() call in lz4_compress()

078a131

In essence, we know the length of the uncompressed payload, so pass it along. To do so, changed the signature of the function (and did the same as dummy for snappy). Signed-off-by: Yaniv Kaul <[email protected]>

mykaul force-pushed the remove_len_from_compress branch from 0244246 to 078a131 Compare August 11, 2025 16:06

mykaul mentioned this pull request Aug 11, 2025

(improvement) remove redundant len() calls from compression code scylladb/gocql#519

Merged

mykaul closed this Aug 12, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Remove len() calls from compress paths #520

Remove len() calls from compress paths #520

Uh oh!

mykaul commented Aug 11, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

mykaul commented Aug 11, 2025

Uh oh!

mykaul commented Aug 11, 2025

Uh oh!

dkropachev commented Aug 11, 2025

Uh oh!

mykaul commented Aug 12, 2025

Uh oh!

roydahan commented Aug 12, 2025

Uh oh!

dkropachev commented Aug 12, 2025

Uh oh!

mykaul commented Aug 13, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Remove len() calls from compress paths #520

Remove len() calls from compress paths #520

Uh oh!

Conversation

mykaul commented Aug 11, 2025

Pre-review checklist

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Uh oh!

mykaul commented Aug 11, 2025

Uh oh!

mykaul commented Aug 11, 2025

Uh oh!

dkropachev commented Aug 11, 2025

Uh oh!

mykaul commented Aug 12, 2025

Uh oh!

roydahan commented Aug 12, 2025

Uh oh!

dkropachev commented Aug 12, 2025

Uh oh!

mykaul commented Aug 13, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants